arXiv:1504.03740v2 [quant-ph] 30 Aug 2015 


Cost of postselection in decision theory 


Joshua Combes and Christopher Ferrie


Center for Quantum Information and Control, University of New Mexico, Albuquerque, New Mexico, 87131-0001
Centre for Engineered Quantum Systems, School of Mathematics and Physics, The University of Queensland, St Lucia, QLD 4072, Australia
Institute for Quantum Computing, University of Waterloo, Ontario N2L 3G1, Canada
Perimeter Institute for Theoretical Physics, 31 Caroline St. N, Waterloo, Ontario, Canada N2L 2Y5
Centre for Engineered Quantum Systems, School of Physics, The University of Sydney, Sydney, NSW, Australia
(Dated: September 1, 2015) 

Postselection is the process of discarding outcomes from statistical trials that are not the event one desires. Postselection can be useful in many applications where the cost of getting the wrong event is implicitly high. However, unless this cost is specified exactly, one might conclude that discarding all data is optimal. Here we analyze the optimal decision rules and quantum measurements in a decision theoretic setting where a pre-specified cost is assigned to discarding data. Our scheme interpolates between unambiguous state discrimination (when the cost of postselection is zero) and a minimum error measurement (when the cost of postselection is maximal). We also relate our formulation to previous approaches which focus on minimizing the probability of indecision.


I. INTRODUCTION 

There has been some confusion over the role of postselection in quantum information processing protocols. On one hand, postselection is a powerful computational resource [1] and enables technological goals, such as probabilistic photon-photon gates [2]. On the other hand, in some situations postselection can impede quantum information processing.

Probabilistic metrology, also known as metrology with abstention [3] and weak value amplification [4], is the idea that postselection may improve estimation precision beyond the usual quantum limits. When the performance of probabilistic metrology is evaluated with respect to the standard figure of merit for parameter estimation, mean squared error, postselection is provably suboptimal, even when there are imperfections [5-12]. Counterclaims have been made in the literature (see Refs. [13-17]) but the issue is far from settled.

In this article we attempt to reconcile the intuition that postselection can help statistical tasks with the fact that for the standard figures of merit generically it does not. To simplify the analysis and make our assumptions explicit we will use a statistical decision theory approach in the context of quantum state discrimination [18, 19]. To assert that a state discrimination protocol is optimal, we must first specify a cost or loss function which encapsulates how each decision is penalized. Then we minimize the average loss over decision rules and measurements.

This approach defines a task for which the optimal protocol incurs the least losses for the specified loss function. For example, consider a two-party discrimination game involving an employer Alice and an employee Bob. Alice gives Bob one of two quantum states $\Psi_1$ or $\Psi_2$. Bob is allowed to perform any generalized measurement on the state but then must report which state Alice gave him; he cannot decline to report a state. Bob’s bonus, of at most $D$ dollars, is tied to his performance in this game. If he reports $\Psi_i$ when $\Psi_j$ is true his bonus will be reduced to $(1 - \lambda_{ij})D$ dollars, where $\lambda_{ij}$ is called the loss function. Bob wants to devise a strategy to minimise his expected losses. When the cost of reporting the correct answer is “0” and the incorrect answer is “1” or maximal, $\lambda_{ij}$ is known as the 0-1 loss function. Minimising the losses from the 0-1 loss function is equivalent to minimizing the probability of misidentifying the states (termed the error probability) [20, 21]. The corresponding optimal measurement strategy, with respect to minimizing losses, is called the Helstrom [20] or minimum error measurement. A postselected strategy will have higher expected losses, that is, it is suboptimal with respect to the 0-1 loss function.

Postselected strategies for state discrimination were introduced by Ivanovic [22], Dieks [23], and Peres [24] in what is now known as unambiguous state discrimination (USD). In USD one allows for an extra “reject” decision (postselection), so that two nonorthogonal states can be distinguished without error, albeit probabilistically. The USD measurement is optimized in the sense that it has minimal probability of reporting the inconclusive result “reject”. Prior work on inconclusive state discrimination has focused on exploring and optimizing schemes which interpolate between minimum error probability and minimum inconclusive result probability [25-31]. Typically in USD and its generalizations [32] there is no explicit penalty for reporting “reject”. It is unclear if such postselection is optimal with respect to any loss function.

Here we re-formalize the inconclusive state discrimination problem by assigning a cost to discarded outcomes. In particular, we modify the most commonly used cost, the 0-1 loss function, to what we call the 0-1-$\lambda$ loss function. In the 0-1-$\lambda$ loss function, $\lambda$ is the cost of reporting “reject”. In our approach, we find that the USD measurement appears when $\lambda \to 0$. In this limit there is an alternative protocol which is equally optimal: always report “reject”. Finally we show how our results can be connected to previous approaches where there is a tradeoff between the rejection probability and the error probability [25-31]. Our analysis adheres to the desiderata suggested in Ref. [9], and thus is a definitive case where employing postselection can be said to be optimal.


II. STATISTICAL DECISION THEORY 

We start by reviewing statistical decision theory and formally introducing the 0-1-$\lambda$ loss function, which is a special case of Chow’s work on hypothesis testing or classification [33, 34]. Consider a set of competing hypotheses $\mathcal{H}_j$ for $j \in \{1, 2, \ldots, n\}$ with prior probabilities $\Pr(\mathcal{H}_j)$. Given some data $D$ the posterior probability of the $j$'th hypothesis is

$$\Pr(\mathcal{H}_j|D) = \frac{\Pr(D|\mathcal{H}_j)\Pr(\mathcal{H}_j)}{\Pr(D)}, \tag{1}$$

where

$$\Pr(D) = \sum_{j=1}^{n} \Pr(D|\mathcal{H}_j)\Pr(\mathcal{H}_j). \tag{2}$$


What we would like is a decision rule $\delta(D)$ that maps the data $D$ to a decision $i$, that is, report hypothesis $i$, where in this case $i \in \{0, 1, 2, \ldots, n\}$. The decision $i = 0$ allows for the possibility that one may not be able to decide, often referred to as the “don’t know” or “abstain” or “reject” option.

In Bayesian decision theory the decision rule must arise from minimizing a loss function, which encapsulates how each decision is penalized. The conditional risk, i.e. the a posteriori expected loss, for the decision $i$ conditioned on data $D$ is

$$\mathcal{R}[i|D] = \sum_{j=1}^{n} \lambda_{i,j} \Pr(\mathcal{H}_j|D), \tag{3}$$

where the loss function is denoted by $\lambda_{i,j}$, which corresponds to reporting hypothesis $i$ when hypothesis $j$ is true. The loss function $\lambda_{i,j}$ is a good place to start building intuitions for the role of postselection in detection and estimation theory.

Following Chow, we will require that

$$\lambda_{i,i} \le \lambda_{0,i} \le \lambda_{i,j} \quad (i \neq j \neq 0), \tag{4}$$

which is interpreted as follows: the loss for making a correct decision ($i \neq 0$) is no greater than the cost of rejecting, $\lambda_{0,j}$, which in turn is no greater than the cost of making a wrong decision, $\lambda_{i,j}$. We relax this assumption in Sec. V, such that $\lambda_{0,j} > \lambda_{i,j}$ is possible. A good description of the mathematical and philosophical requirements of a loss function can be found in chapter 2 of Ref. [35].



FIG. 1: The Bloch representation of the states and POVM elements involved in the state discrimination protocol. The POVM elements are not mixed states but subnormalized rank-1 operators, which lie on a circle at a lower level in a cone of positive operators. The grey lines in the left figure are the arc of the POVM elements as $\phi$ is varied in Eq. (19) from 0 to $\pi/2$. The right figure illustrates two special cases of the POVM elements. When $\phi = \pi/2$ there are only two POVM elements and the measurement is the Helstrom measurement. When $\phi = \theta$ we recover the USD measurement.


The optimal decision is

$$\delta^*(D) = \arg\min_i \mathcal{R}[i|D]. \tag{5}$$

When we turn our attention to quantum hypothesis testing we will need to determine the optimal measurement to pair with this optimal decision rule. The criterion for optimality we adopt will require us to minimize the average of the posterior risk

$$\mathcal{R}[\delta(D)] = \sum_D \sum_j \lambda_{\delta(D),j} \Pr(\mathcal{H}_j|D) \Pr(D) \tag{6a}$$
$$\phantom{\mathcal{R}[\delta(D)]} = \sum_D \sum_j \lambda_{\delta(D),j} \Pr(D|\mathcal{H}_j) \Pr(\mathcal{H}_j), \tag{6b}$$

over the distribution of data and the measurement. When we assume the optimal decision is being used we denote the total risk as $\mathcal{R}^* = \mathcal{R}[\delta^*(D)]$.

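To simplify our analysis we will consider binary hypothesis testing (i.e. $\mathcal{H}_1$ vs $\mathcal{H}_2$) and take

$$\lambda_{1,1} = \lambda_{2,2} = 0, \qquad \lambda_{1,2} = \lambda_{2,1} = 1, \qquad \lambda_{0,1} = \lambda_{0,2} = \lambda, \tag{7}$$

which we call the “0-1-$\lambda$” loss function. For the 0-1-$\lambda$ loss function the conditional risks for decisions $i$ are

$$\mathcal{R}[2|D] = 1 - \Pr(\mathcal{H}_2|D), \qquad \mathcal{R}[1|D] = 1 - \Pr(\mathcal{H}_1|D), \qquad \mathcal{R}[0|D] = \lambda, \tag{8}$$

where we have used $\Pr(\mathcal{H}_1|D) + \Pr(\mathcal{H}_2|D) = 1$.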

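Thus our decision rule $\delta^*(D)$ is

$$\delta^*(D) = \begin{cases} 2 & \text{if } \mathcal{R}[2|D] < \mathcal{R}[1|D] \text{ and } \mathcal{R}[0|D] \\ 1 & \text{if } \mathcal{R}[1|D] < \mathcal{R}[2|D] \text{ and } \mathcal{R}[0|D] \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

With respect to the posterior probabilities we find

$$\delta^*(D) = \begin{cases} 2 & \text{if } \Pr(\mathcal{H}_2|D) > 1 - \lambda \text{ and } \Pr(\mathcal{H}_1|D) \\ 1 & \text{if } \Pr(\mathcal{H}_1|D) > 1 - \lambda \text{ and } \Pr(\mathcal{H}_2|D) \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$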

In words, the decision rule is as follows: find the largest posterior probability; if it is greater than or equal to the threshold $1 - \lambda$, report it; if it is less than $1 - \lambda$ report “reject”. Now we connect this decision theoretic framework to quantum hypothesis testing.
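
As a concrete illustration of Eqs. (8) and (9), the following Python sketch evaluates the three conditional risks for the binary 0-1-$\lambda$ loss and returns the decision. The function name `decide_01_lambda` and its arguments are illustrative choices, not notation from the text, and ties are sent to “reject” by following the “otherwise” branch of Eq. (9) literally.

```python
def decide_01_lambda(p1, lam):
    """Decision for the binary 0-1-lambda loss, following Eq. (9).

    p1  : posterior probability Pr(H_1 | D); Pr(H_2 | D) is taken as 1 - p1
    lam : cost of reporting "reject" (assumed 0 <= lam <= 1/2 as in the text)
    Returns 1 or 2 for the reported hypothesis, or 0 for "reject".
    """
    p2 = 1.0 - p1
    # Conditional risks of Eq. (8)
    risk_reject = lam
    risk_1 = 1.0 - p1
    risk_2 = 1.0 - p2
    if risk_2 < risk_1 and risk_2 < risk_reject:
        return 2
    if risk_1 < risk_2 and risk_1 < risk_reject:
        return 1
    return 0  # "otherwise" branch: reject


# Hypothetical posteriors, chosen only to exercise each branch
print(decide_01_lambda(0.9, 0.3))  # confident in H_1
print(decide_01_lambda(0.2, 0.3))  # confident in H_2
print(decide_01_lambda(0.6, 0.3))  # not confident enough: reject
```

With a reject cost of 0.3, a posterior must exceed 0.7 before a hypothesis is reported, which is the threshold $1 - \lambda$ described above.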


III. STATE DISCRIMINATION 

In quantum theory the statistics of measurements are described by a positive operator valued measure (POVM) $\{E_D\}$, the elements of which sum to the identity: $\sum_D E_D = I$. The number of elements of a POVM is the number of outcomes of the measurement. To match this with our previous terminology the outcomes of the measurement are the data $D$. In order to encompass both USD and Helstrom measurements we must consider a three-outcome POVM $E_D$ where $D \in \{0, 1, 2\}$. Let us make the following symmetry assumptions to make the discussion less cumbersome:


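$$\Pr(\mathcal{H}_1) = \Pr(\mathcal{H}_2), \tag{11a}$$
$$\Pr(E_1) = \Pr(E_2), \tag{11b}$$
$$\Pr(E_1|\mathcal{H}_1) = \Pr(E_2|\mathcal{H}_2), \tag{11c}$$
$$\Pr(E_1|\mathcal{H}_2) = \Pr(E_2|\mathcal{H}_1), \tag{11d}$$
$$\Pr(E_0|\mathcal{H}_1) = \Pr(E_0|\mathcal{H}_2). \tag{11e}$$

These symmetries are implied, for example, by the states and operators in Fig. 1.

Utilizing some of these symmetries the total risk in Eq. (6b) becomes

$$\mathcal{R} = \tfrac{1}{2}\big[(\lambda_{\delta(0),1} + \lambda_{\delta(0),2}) \Pr(E_0|\mathcal{H}_1) + (\lambda_{\delta(1),1} + \lambda_{\delta(2),2}) \Pr(E_1|\mathcal{H}_1) + (\lambda_{\delta(2),1} + \lambda_{\delta(1),2}) \Pr(E_2|\mathcal{H}_1)\big]. \tag{12}$$

Next we use the optimal decision rule, Eq. (9) or Eq. (10), and more of the symmetries to massage this expression. Further, we assume that $\lambda \le 1/2$; as for $\lambda > 1/2$ one can always randomly choose to report $\mathcal{H}_1$ or $\mathcal{H}_2$ and reduce the expected risk (in Sec. V we will relax this assumption). Equation (11e) implies $\Pr(\mathcal{H}_1|E_0) = \Pr(\mathcal{H}_2|E_0) = 1/2$, thus the lowest conditional risk, i.e. Eq. (8), implies that the optimal decision for $D = 0$ is $\delta^*(0) = 0$ always. Also $\lambda_{\delta^*(1),1} = \lambda_{\delta^*(2),2}$ and $\lambda_{\delta^*(2),1} = \lambda_{\delta^*(1),2}$ are implied by symmetry as well. Using these relations we obtain

$$\mathcal{R}^* = \lambda_{0,1} \Pr(E_0|\mathcal{H}_1) + \lambda_{\delta^*(1),1} \Pr(E_1|\mathcal{H}_1) + \lambda_{\delta^*(2),1} \Pr(E_2|\mathcal{H}_1). \tag{13}$$

Recall from Eq. (7) that $\lambda_{0,1} = \lambda$. Using this and Bayes rule we obtain

$$\mathcal{R}^* = 2\big[\lambda \Pr(\mathcal{H}_1|E_0)\Pr(E_0) + \lambda_{\delta^*(1),1} \Pr(\mathcal{H}_1|E_1)\Pr(E_1) + \lambda_{\delta^*(2),1} \Pr(\mathcal{H}_1|E_2)\Pr(E_2)\big]. \tag{14}$$

Then using $\Pr(E_0) = 1 - \Pr(E_1) - \Pr(E_2) = 1 - 2\Pr(E_1)$ we have

$$\mathcal{R}^* = 2\Big\{\tfrac{1}{2}\lambda\,[1 - 2\Pr(E_1)] + \big[\lambda_{\delta^*(1),1} \Pr(\mathcal{H}_1|E_1) + \lambda_{\delta^*(1),2} \Pr(\mathcal{H}_2|E_1)\big] \Pr(E_1)\Big\}, \tag{15}$$

where we have used $\Pr(\mathcal{H}_1|E_2) = \Pr(\mathcal{H}_2|E_1)$ and Eq. (11b). The term $T = [\lambda_{\delta^*(1),1} \Pr(\mathcal{H}_1|E_1) + \lambda_{\delta^*(2),1} \Pr(\mathcal{H}_1|E_2)]$ still depends on the optimal decision rule so we must explicitly use it. It is important to note that we cannot assume $\delta^*(1) = 1$; this means we must consider two cases ($\delta^*(1) = 2$ is obviously ruled out by symmetry): (1) $\delta^*(1) = 0$: this implies $T = \lambda[\Pr(\mathcal{H}_1|E_1) + \Pr(\mathcal{H}_2|E_1)] = \lambda$; or (2) $\delta^*(1) = 1$: this implies $T = \Pr(\mathcal{H}_2|E_1)$. Using the optimal decision rule, the risk becomes

$$\mathcal{R}^* = \begin{cases} \lambda \Pr(E_0|\mathcal{H}_2) + \Pr(E_1|\mathcal{H}_2) & \text{if } \Pr(\mathcal{H}_2|E_1) < \lambda \\ \lambda & \text{otherwise.} \end{cases} \tag{16}$$

Equivalently this can be written as

$$\mathcal{R}^* = \lambda + \min\{0,\ \Pr(E_1|\mathcal{H}_2) - \lambda[1 - \Pr(E_0|\mathcal{H}_2)]\}. \tag{17}$$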

The above risk is true for the 0-1-$\lambda$ loss function and any two hypotheses and measurements satisfying the symmetry conditions. The first term represents the part of the expected risk when a rejection is made. The second term is not yet optimized over the possible measurements.

As a specific example, here we will consider the problem of discriminating the following two quantum states:

$$\mathcal{H}_1: \quad |\Psi_1\rangle = \cos\tfrac{\theta}{2}\,|0\rangle + \sin\tfrac{\theta}{2}\,|1\rangle, \tag{18a}$$
$$\mathcal{H}_2: \quad |\Psi_2\rangle = \cos\tfrac{\theta}{2}\,|0\rangle - \sin\tfrac{\theta}{2}\,|1\rangle, \tag{18b}$$

where $0 < \theta < \pi/2$, $|\langle\Psi_2|\Psi_1\rangle| = \cos\theta$ and the prior probabilities are $\Pr(\mathcal{H}_1) = \Pr(\mathcal{H}_2) = 1/2$.

FIG. 2: Expected risk $\mathcal{R}$ (row 1) and decision rule (row 2) for the 0-1-$\lambda$ loss function. In all figures the abscissa is $\phi$ (the measurement angle) and the ordinate is $\lambda$ (the cost of reporting “reject”). The dark black line is the minimum risk ($\mathcal{R}^*[\phi^*]$) for a given $\lambda$ and thus specifies the optimal measurement angle. The shaded regions in the second row are simply the region for which the expected risk is less than $\lambda$; in this region one always reports $i$ if one obtained outcome $E_i$.

The symmetries we imposed in Eq. (11) imply the measurement is in fact a generalized measurement with POVM elements

$$E_2(\phi) = \frac{1}{2\cos^2\frac{\phi}{2}} \begin{pmatrix} \sin^2\frac{\phi}{2} & -\sin\frac{\phi}{2}\cos\frac{\phi}{2} \\ -\sin\frac{\phi}{2}\cos\frac{\phi}{2} & \cos^2\frac{\phi}{2} \end{pmatrix}, \quad E_1(\phi) = \frac{1}{2\cos^2\frac{\phi}{2}} \begin{pmatrix} \sin^2\frac{\phi}{2} & \sin\frac{\phi}{2}\cos\frac{\phi}{2} \\ \sin\frac{\phi}{2}\cos\frac{\phi}{2} & \cos^2\frac{\phi}{2} \end{pmatrix}, \quad E_0(\phi) = \begin{pmatrix} 1 - \tan^2\frac{\phi}{2} & 0 \\ 0 & 0 \end{pmatrix}, \tag{19}$$

such that $E_2(\phi) + E_1(\phi) + E_0(\phi) = I$. When $\phi = \pi/2$ we get $E_0 = 0$, $E_1 = |+\rangle\langle+|$, $E_2 = |-\rangle\langle-|$ (where $|\pm\rangle$ are the eigenstates of the Pauli X operator), which is the Helstrom measurement for all $\theta$. When $\phi = \theta$ we obtain the USD measurement for all $\theta$. In Fig. 1 the grey lines are the arc traced by Eq. (19) as a function of $\phi$. Note that for $\phi > \pi/2$ the POVM element $E_0$ is not a positive operator, thus we do not allow these values of $\phi$.
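
The properties of the POVM in Eq. (19) quoted above can be checked numerically. The NumPy sketch below is an illustration only: the helper names `states`, `povm` and `prob` are hypothetical, and the matrices are rewritten in terms of $t = \tan(\phi/2)$, which is algebraically the same as Eq. (19) after dividing through by $\cos^2(\phi/2)$.

```python
import numpy as np

def states(theta):
    """State vectors |Psi_1>, |Psi_2> of Eq. (18) in the {|0>, |1>} basis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([c, s]), np.array([c, -s])

def povm(phi):
    """POVM elements (E_0, E_1, E_2) of Eq. (19), written with t = tan(phi/2)."""
    t = np.tan(phi / 2)
    E1 = 0.5 * np.array([[t**2, t], [t, 1.0]])
    E2 = 0.5 * np.array([[t**2, -t], [-t, 1.0]])
    E0 = np.array([[1.0 - t**2, 0.0], [0.0, 0.0]])
    return E0, E1, E2

def prob(E, psi):
    """Born rule <psi|E|psi> for a real state vector psi."""
    return float(psi @ E @ psi)

theta = np.pi / 8            # example angle, chosen arbitrarily
psi1, psi2 = states(theta)

# Completeness: the three elements sum to the identity for any phi
E0, E1, E2 = povm(0.9)
completeness_ok = np.allclose(E0 + E1 + E2, np.eye(2))

# phi = pi/2: the inconclusive element vanishes (Helstrom measurement)
E0_h, E1_h, E2_h = povm(np.pi / 2)
helstrom_has_no_reject = np.allclose(E0_h, 0.0)

# phi = theta: outcome E_1 never occurs for |Psi_2> (USD measurement)
E0_u, E1_u, E2_u = povm(theta)
usd_error = prob(E1_u, psi2)
usd_reject = prob(E0_u, psi1)    # equals cos(theta) for this family

print(completeness_ok, helstrom_has_no_reject, usd_error, usd_reject)
```

At $\phi = \theta$ the probability of the inconclusive outcome evaluates to $\cos\theta$, the overlap of the two states.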

To apply the above decision theoretic formalism we need to compute the probabilities given in Eq. (17). All of these probabilities can be computed using the usual rule:

$$\Pr(E_0|\Psi_i, \phi) = \langle\Psi_i|E_0(\phi)|\Psi_i\rangle, \tag{20}$$

see footnote [36] for some examples. Notice how all of the probabilities depend on the measurement angle $\phi$; this means the expected risk will also be a function of $\phi$.

Given the POVM elements in Eq. (19) the expected risk is

$$\mathcal{R}^*[\phi] = \lambda + \min\left\{0,\ \frac{(2\lambda - 1)(\cos\theta\cos\phi - 1) - \sin\theta\sin\phi}{2(1 + \cos\phi)}\right\}. \tag{21}$$

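Intuitively this says the risk is at most $\lambda$ and sometimes less. This risk is plotted in Fig. 2 as a function of $\lambda$ and $\phi$ for particular values of $\theta$. To find the optimal angle we fix $\lambda$ and ask which $\phi$ minimizes $\mathcal{R}^*[\phi]$. This can be done analytically. The trivial case is when $\mathcal{R}^*[\phi] = \lambda$ and thus no optimization over $\phi$ is possible. Otherwise the optimal measurement is found by solving

$$\frac{d}{d\phi}\left[\frac{(2\lambda - 1)(\cos\theta\cos\phi - 1) - \sin\theta\sin\phi}{2(1 + \cos\phi)}\right] = 0, \tag{22}$$

for $\phi$. The constraint on the positivity of the measurement operators, i.e. $\phi \le \pi/2$, results in the following piecewise definition of the optimal measurement angle

$$\phi^* = \begin{cases} 2\cot^{-1}\!\left[(1 - 2\lambda)\cot\frac{\theta}{2}\right] & \text{if } \lambda < \frac{1}{2}\left(1 - \tan\frac{\theta}{2}\right) \\ \frac{\pi}{2} & \text{if } \lambda \ge \frac{1}{2}\left(1 - \tan\frac{\theta}{2}\right). \end{cases} \tag{23}$$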


This optimal angle is plotted as the solid black lines in Fig. 2. The decision functions plotted in the second row of Fig. 2 are particularly simple: in the shaded regions report $D$ if $E_D$ is observed, and in the non-shaded regions report “reject” (i.e. 0) if $E_D$ is observed.
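
The optimal angle of Eq. (23) and the risk of Eq. (21) are simple to evaluate numerically. The sketch below is illustrative: the function names are hypothetical, and $\cot^{-1}$ is rewritten as an arctangent, which is valid here because the argument is positive on the first branch. It also compares the closed-form angle against a brute-force scan over the allowed range $0 \le \phi \le \pi/2$.

```python
import numpy as np

def optimal_angle(theta, lam):
    """Optimal measurement angle phi* of Eq. (23)."""
    if lam >= 0.5 * (1.0 - np.tan(theta / 2)):
        return np.pi / 2                      # Helstrom branch
    # 2*arccot[(1 - 2 lam) cot(theta/2)] == 2*arctan[tan(theta/2) / (1 - 2 lam)]
    return 2.0 * np.arctan(np.tan(theta / 2) / (1.0 - 2.0 * lam))

def risk(theta, lam, phi):
    """Expected risk of Eq. (21) at measurement angle phi."""
    num = (2 * lam - 1) * (np.cos(theta) * np.cos(phi) - 1) - np.sin(theta) * np.sin(phi)
    return lam + min(0.0, num / (2 * (1 + np.cos(phi))))

theta, lam = np.pi / 8, 0.3                   # the example used in the caption of Fig. 6
phi_star = optimal_angle(theta, lam)
r_star = risk(theta, lam, phi_star)           # approximately 0.26

# Brute-force check that no allowed angle does better than phi*
grid = np.linspace(0.0, np.pi / 2, 2001)
r_grid = np.array([risk(theta, lam, p) for p in grid])

print(phi_star, r_star, r_grid.min())
```

For $\lambda = 0$ the function returns $\phi^* = \theta$, and for $\lambda$ above $\frac{1}{2}(1 - \tan\frac{\theta}{2})$ it returns $\pi/2$, the two limiting measurements discussed in the text.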

From Fig. 2 it is clear that, as a function of $\lambda$, the optimal measurement angle interpolates between the USD and the Helstrom measurement. This can be made explicit as follows. The second branch of Eq. (23), i.e. when $\phi^* = \pi/2$, is the Helstrom measurement. Plugging $\lambda = 0$ into Eq. (23) gives $\phi^* = \theta$, so $\lambda = 0$ implies the USD measurement. However, $\lambda = 0$ is also a degenerate case where no cost is assigned to reporting “reject”. Thus, the risk is also minimized by reporting “reject” for any outcome of any measurement or, equivalently, not bothering to make the measurement and simply reporting “reject”. Recall that what we are calling the USD measurement is the one which minimizes the probability of obtaining the “reject” outcome in the usual paradigm. Here, as expected, the USD measurement is approached for $\lambda \to 0$. This is also when the probability for reporting “reject” is maximized, see Fig. 5 of Sec. IV.

FIG. 3: The optimal measurement angle $\phi^*$ that minimizes the risk for the 0-1-$\lambda$ loss, i.e. Eq. (23), as a function of $\lambda$ and $\theta$. The dot-dashed line at $\lambda = 0$ corresponds to the USD measurement when $\phi^* = \theta$. Above the dashed line the Helstrom measurement is optimal. The optimal angle has been discretized for plotting.

To complete the example we plot in Fig. 3 the optimal measurement angle $\phi^*$ as a function of $\lambda$ and the angle $\theta$ between the states and the z axis. The USD protocol corresponds to the line at $\lambda = 0$ while the Helstrom measurement is performed when $\phi^* = \pi/2$. The area where $\phi^* = \pi/2$ is approximately half of the parameter space, i.e. $\lambda > \frac{1}{2}(1 - \theta/2) + O(\theta^3)$, thus even when the loss function encourages postselection it is not guaranteed to be optimal.

Other studies of inconclusive state discrimination [27, 28, 30, 31] concern themselves with the probabilities of error and reporting the “reject” result. This avoids the question of what to do given the outcome of some measurement. Here we have phrased the problem as a decision theoretic one where the loss is incurred on the decisions and once that loss is specified, a definitive answer can be given. In real applications, it would be unlikely that an agent’s decisions are constrained to be deterministic functions of measurement operators. Indeed, our results imply that loosening that constraint can only decrease the agent’s risk if they cannot measure at the optimal angle for a given $\lambda$.


IV. RELATIONSHIP BETWEEN RISK AND 
ERROR AND REJECT PROBABILITIES 

So far we have focused on the decision function and the loss function. In this section we connect our approach to the previous approaches which focus on tradeoffs between reject and error probabilities [31], and rejection thresholds [30].

For equal prior probabilities the optimal decision rule, when measuring at the optimal angle, is particularly simple: report $D$ if $E_D$ is observed. Let the probability of making the correct decision be $C$, the probability of error be $E$, the probability of rejection be $R$ and the probability that a piece of data is accepted be $A$. These probabilities can be written explicitly as follows:

$$\Pr(C|\theta,\lambda) = \sum_{i \in \{1,2\}} \Pr(\mathcal{H}_i) \Pr[E_i(\phi^*)|\Psi_i], \tag{24a}$$
$$\Pr(E|\theta,\lambda) = \sum_{i,j \in \{1,2\},\ i \neq j} \Pr(\mathcal{H}_i) \Pr[E_j(\phi^*)|\Psi_i], \tag{24b}$$
$$\Pr(R|\theta,\lambda) = \sum_{i \in \{1,2\}} \Pr(\mathcal{H}_i) \Pr[E_0(\phi^*)|\Psi_i], \tag{24c}$$
$$\Pr(A|\theta,\lambda) = \Pr(C|\theta,\lambda) + \Pr(E|\theta,\lambda). \tag{24d}$$

These probabilities obey $\Pr(E) + \Pr(C) + \Pr(R) = 1$ which implies $\Pr(A) + \Pr(R) = 1$.

In Fig. 4 we plot these probabilities as a function of the angle $\theta$ between the states. A strategy without postselection adheres to the lines of Fig. 4 when $\lambda = 0.5$. Deviating from this behavior indicates postselection. Notice that as $\theta \to 0$, $\Pr(R) \to 1$ for all $\lambda$ except $\lambda = 0.5$. In Fig. 5 we plot the error probability and reject probability as a function of the rejection threshold. Postselection occurs whenever $\Pr(A) < 1$. Notice that as $\lambda$ approaches 0, the probability of rejection gets closer to 1 for all values of $\theta$.

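In 1970 Chow [34] showed a particularly simple relationship between the error probabilities and the minimum risk under the optimal decision rule

$$\mathcal{R}^*(\theta,\lambda) = \Pr(E|\theta,\lambda) + \lambda \Pr(R|\theta,\lambda), \tag{25a}$$
$$\phantom{\mathcal{R}^*(\theta,\lambda)} = \int_0^{\lambda} d\lambda'\, \Pr(R|\theta,\lambda',\phi^*). \tag{25b}$$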

Both of these expressions can be visualized graphically, see Fig. 6. Prior to our work the expression given in Eq. (25a) is one of the ways the loss function has been explained, see e.g. [31]. It is important that the optimal decision rule and measurement angle are used, otherwise the risk will generally be different to the above risk.
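
The equality of Eqs. (25a) and (25b) can be checked numerically for the qubit example. The sketch below is an illustration under stated assumptions: the closed forms for the reject and error probabilities are obtained by applying the Born rule, Eq. (20), to the states of Eq. (18) and the POVM of Eq. (19) at the optimal angle of Eq. (23); the function names are hypothetical; and the integral is approximated with a hand-written trapezoid rule.

```python
import numpy as np

def tan_half_opt(theta, lam):
    """tan(phi*/2) for the optimal angle phi* of Eq. (23)."""
    if lam >= 0.5 * (1.0 - np.tan(theta / 2)):
        return 1.0                                  # phi* = pi/2 (Helstrom)
    return np.tan(theta / 2) / (1.0 - 2.0 * lam)

def p_reject(theta, lam):
    """Pr(R | theta, lam) = <Psi_i| E_0(phi*) |Psi_i>, cf. Eq. (24c)."""
    t = tan_half_opt(theta, lam)
    return np.cos(theta / 2) ** 2 * (1.0 - t ** 2)

def p_error(theta, lam):
    """Pr(E | theta, lam) = <Psi_2| E_1(phi*) |Psi_2> for equal priors, cf. Eq. (24b)."""
    t = tan_half_opt(theta, lam)
    return 0.5 * (t * np.cos(theta / 2) - np.sin(theta / 2)) ** 2

theta, lam = np.pi / 8, 0.3

# Eq. (25a): error probability plus lambda times reject probability
risk_25a = p_error(theta, lam) + lam * p_reject(theta, lam)

# Eq. (25b): area under the reject-probability curve from 0 to lambda (trapezoid rule)
lams = np.linspace(0.0, lam, 20001)
rej = np.array([p_reject(theta, l) for l in lams])
risk_25b = float(np.sum(0.5 * (rej[1:] + rej[:-1]) * np.diff(lams)))

print(p_reject(theta, lam), p_error(theta, lam), risk_25a, risk_25b)
```

For $\theta = \pi/8$ and $\lambda = 0.3$ the two expressions agree to numerical precision, and the individual probabilities match the values quoted in the caption of Fig. 6.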

It turns out that $\Pr(E)$ can be derived from $\Pr(R)$ for a particular rejection threshold. Chow [34] has shown that the Stieltjes integral of $\lambda$ with respect to $\Pr(R|\theta,\lambda)$ is precisely the error probability

$$\Pr(E|\theta,\lambda) = -\int_0^{\lambda} \lambda'\, d\Pr(R|\theta,\lambda'). \tag{26}$$

[Figure 4 legend: curves for Pr(C), Pr(E), Pr(A) and Pr(R); gray curves show Pr(·) when $\lambda = 0.5$; the abscissa is $\theta$.]

FIG. 4: The probabilities in Eq. (24) as a function of the angle between the states $\theta$. When $\lambda = 0.5$ it is easy to show that $\Pr(C|\theta) = 1 - \Pr(E|\theta) = (1/2)(1 + \sin\theta)$, $\Pr(A|\theta) = 1$, and $\Pr(R) = 0$ as plotted in the top left plot. These lines are the gray lines on the other figures. Generically as $\theta \to 0$ the probability for reporting “don’t know” indeed approaches one except when $\lambda = 0.5$. When the equality in the second branch of Eq. (23) is satisfied we see the measurement switches from one with an inconclusive outcome to the Helstrom measurement, i.e. $\Pr(A) = 1$ and $\Pr(R) = 0$ and $\Pr(C|\theta) = 1 - \Pr(E|\theta) = (1/2)(1 + \sin\theta)$.




FIG. 6: The relationship between risk and probability for rejection. The rejection probability is plotted as a function of the rejection threshold $\lambda$ when $\theta = \pi/8$. Consider a rejection threshold of $\lambda = 0.3$; given this threshold and the angle between the states the expected risk can be computed from Eq. (22) to be $\mathcal{R}^* \approx 0.26$. Equation (25b) shows this is equivalent to the (shaded) area under the curve up to the rejection threshold. The area under the curve can be decomposed into a rectangle with height $\Pr(R|\theta,\lambda) \approx 0.724$ and width $\lambda = 0.3$, so $\lambda\Pr(R|\theta,\lambda) \approx 0.2172$; the integral given in Eq. (27) results in $\Pr(E|\theta,\lambda) \approx 0.0428$ and thus $\mathcal{R}^* = \Pr(E|\theta,\lambda) + \lambda\Pr(R|\theta,\lambda)$.


As noted by Chow, this expression is suggestive of an error probability-reject probability tradeoff relation, see Fig. 7. If $\Pr(R|\theta,\lambda)$ is differentiable with respect to $\lambda$ then the Stieltjes integral reduces to the Riemann integral

$$\Pr(E|\theta,\lambda) = -\int_0^{\lambda} \lambda'\, \frac{d}{d\lambda'} \Pr(R|\theta,\lambda')\, d\lambda'. \tag{27}$$

From Eq. (26) and Eq. (27) it is clear that the magnitude of the slope of the error-reject tradeoff curve in Fig. 7 is exactly the value of the rejection threshold. Consequently the tradeoff is most effective initially and is less rewarding as the desired error decreases. In Fig. 7 we also see that specifying a particular rejection threshold, e.g. $\Pr(R) = Q$ as in [30], implies a value for $\lambda$ and $\Pr(E)$ (once $\theta$ is fixed).

FIG. 5: The rejection and error probabilities as a function of $\lambda$. When $\lambda = 0$ the measurement strategy is precisely the USD measurement and the rejection probability attains its maximum $\Pr(R) = \cos\theta$. Now consider the values of $\lambda$ for which $\Pr(R) = 0$. For example when $\theta = \pi/8$, $\Pr(R) = 0$ when $\lambda \in [0.4, 0.5]$. As $\lambda$ is decreased the probability of reject increases and probability of error decreases with diminishing returns.

FIG. 7: Error-reject tradeoff curve. In fact the derivative of $\Pr(E)$ with respect to $\Pr(R)$ is $-\lambda$. These curves are implicit functions of $\lambda$. The tradeoff is not linear in the rejection threshold $\lambda$. This is evident on the line corresponding to $\theta = \pi/8$ where six crosses corresponding to $\lambda \in \{0, 0.1, 0.2, 0.3, 0.4, 0.5\}$ are plotted.


V. THE 0-$\lambda_E$-$\lambda_R$ LOSS FUNCTION

Here we generalize the 0-1-$\lambda$ loss function to the 0-$\lambda_E$-$\lambda_R$ loss function, where $\lambda_E$ is the cost of reporting the incorrect decision and $\lambda_R$ is the cost of reporting reject, i.e.,

$$\lambda_{1,1} = \lambda_{2,2} = 0, \qquad \lambda_{1,2} = \lambda_{2,1} = \lambda_E, \qquad \lambda_{0,1} = \lambda_{0,2} = \lambda_R. \tag{28}$$

For the 0-$\lambda_E$-$\lambda_R$ loss function in Eq. (28) the conditional risks for decisions $i$ are

$$\mathcal{R}[2|D] = \lambda_E[1 - \Pr(\mathcal{H}_2|D)], \qquad \mathcal{R}[1|D] = \lambda_E[1 - \Pr(\mathcal{H}_1|D)], \qquad \mathcal{R}[0|D] = \lambda_R. \tag{29}$$
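
The conditional risks of Eq. (29) can be tabulated directly and minimized as in Eq. (5). The Python sketch below is illustrative only: the function names are hypothetical, and instead of committing to a tie-breaking convention it returns every decision that attains the minimum, so a posterior of 1/2 with a large reject cost yields both hypotheses.

```python
import numpy as np

def conditional_risks(p1, lam_E, lam_R):
    """Conditional risks of Eq. (29) for the 0-lam_E-lam_R loss.

    p1 is the posterior Pr(H_1 | D); Pr(H_2 | D) is taken as 1 - p1.
    Keys: 0 = reject, 1 = report H_1, 2 = report H_2.
    """
    p2 = 1.0 - p1
    return {0: lam_R, 1: lam_E * (1.0 - p1), 2: lam_E * (1.0 - p2)}

def minimal_risk_decisions(p1, lam_E, lam_R):
    """All decisions attaining the minimum conditional risk, cf. Eq. (5)."""
    risks = conditional_risks(p1, lam_E, lam_R)
    best = min(risks.values())
    return sorted(i for i, r in risks.items() if np.isclose(r, best))

# Hypothetical values chosen to show the three regimes
print(minimal_risk_decisions(0.5, 1.0, 1.0))  # reject is costly: either hypothesis is optimal
print(minimal_risk_decisions(0.5, 4.0, 1.0))  # errors are costly: reject
print(minimal_risk_decisions(0.9, 4.0, 1.0))  # confident posterior: report H_1
```

Setting `lam_E = 1` and `lam_R` equal to $\lambda$ recovers the conditional risks of the 0-1-$\lambda$ loss in Eq. (8).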



[Figure 8 panels, column headings: Report “Reject”, Report $\mathcal{H}_1$, Report $\mathcal{H}_2$; one panel carries the annotation “never report $\mathcal{H}_2$ if $E_1$”.]

FIG. 8: Decision regions for the 0-$\lambda_E$-$\lambda_R$ loss function. In all figures the angle between the states is $\theta = \pi/8$ and the reject loss was chosen to be $\lambda_R = 1$. The shaded regions should be interpreted as report the column heading. In row one, the reporting of a hypothesis given the inconclusive outcome is a result of Eq. (30). Evidently, as $\lambda_E$ becomes large the decision rule becomes more like unambiguous state discrimination.


The following analysis assumes the same states [Eq. (18)], prior probabilities [$\Pr(\mathcal{H}_i) = 1/2$], and measurements [Eq. (19)], as before. Of particular interest is the case when the measurement outcome $E_0$ is obtained, i.e. $D = 0$; then the conditional risks are

$$\mathcal{R}[2|0] = \lambda_E/2, \qquad \mathcal{R}[1|0] = \lambda_E/2, \qquad \mathcal{R}[0|0] = \lambda_R. \tag{30}$$

Thus if $\lambda_R > \lambda_E/2$ we should never reject; instead we should report either hypothesis, as illustrated in row 1 of Fig. 8. In Fig. 8 we have chosen $\lambda_R = 1$ so that for all $\lambda_E < 2$ we must report either hypothesis to minimize our risk. In particular if we perform a measurement with an inconclusive outcome ($\phi < \pi/2$) and obtain the inconclusive outcome $E_0$ we should randomly choose between reporting $\mathcal{H}_1$ and $\mathcal{H}_2$. For $\lambda_R < \lambda_E/2$ we find

$$\delta(D) = \begin{cases} 2 & \text{if } \Pr(\mathcal{H}_2|D) > 1 - \frac{\lambda_R}{\lambda_E} \text{ and } \Pr(\mathcal{H}_1|D) \\ 1 & \text{if } \Pr(\mathcal{H}_1|D) > 1 - \frac{\lambda_R}{\lambda_E} \text{ and } \Pr(\mathcal{H}_2|D) \\ 0 & \text{otherwise.} \end{cases} \tag{31}$$

In words, the decision rule is as follows: find the largest posterior probability; if it is greater than or equal to the threshold $1 - \lambda_R/\lambda_E$, report it; if it is less than $1 - \lambda_R/\lambda_E$, report “reject”.

Like the 0-1-$\lambda$ loss function, the 0-$\lambda_E$-$\lambda_R$ loss function also interpolates between the Helstrom measurement and unambiguous state discrimination, as illustrated in Fig. 9. Notice, for both loss functions, we did not need to “normalize” the loss function or add additional constraints such as $\Pr(E) = 0$ or $\Pr(R) = 0$, unlike other approaches [31].

FIG. 9: Risk as a function of measurement angle $\phi$ and the cost of reporting the wrong decision $\lambda_E$ for the 0-$\lambda_E$-$\lambda_R$ loss function. Here $\theta = \pi/8$ and the reject loss was chosen to be $\lambda_R = 1$. For $\lambda_E < 2.5$ we see the optimal measurement is the Helstrom measurement and as $\lambda_E \to \infty$ the optimal measurement approaches the USD measurement.


VI. DISCUSSION


In the ongoing debate about postselection for information theoretic tasks in quantum theory, we have given a plausible example where postselection is a feature of the optimal solution. We say plausible because the loss function on the decisions was not tailored to favor full-blown postselection; the solution was not obvious.

In Sec. III we have shown that USD measurements only arise in the limit when the cost assigned to discarding data is exactly zero, which corresponds to the line $\lambda = 0$ for all $\theta$ in Fig. 3. In contrast, the Helstrom measurement appears to be the natural measurement for approximately half of the parameter space, $\lambda > \frac{1}{2}(1 - \theta/2)$. For the remainder of the parameter space, i.e. $\lambda < \frac{1}{2}(1 - \theta/2)$, strategies involving postselection (that are not USD) are optimal. In Sec. IV we unified three seemingly separate approaches, namely the decision theoretic approach (i.e. our 0-1-$\lambda$ loss function), the rejection threshold approach [30], and the probability tradeoff approach [31]. Section V highlighted that the decision function cannot simply be ignored: in some situations it is better to report an answer even if the inconclusive outcome was obtained.

It is natural to ask what the implications of our analysis are. In practical situations it could be desirable to reduce errors by rejecting some data, but excessive rejection is required to reduce error to zero. And, at the point where the error is zero one can equivalently reject without bothering to perform any experiment, as the cost of rejection is also zero. Generally this implies that when a loss function is specified as conditional on some event being successful, this is equivalent to assigning cost to a rejection option. Again, if the cost of rejection is zero why should you bother to perform the experiment at all? We have suggested a sensible approach is to embed a postselection protocol into a class of protocols which assign loss for discarding data; this makes clear the price of postselection.

For example, consider offline magic state distillation for quantum computation [37]. The success probability is relevant for quantifying efficiency (or expected yield in Sec. VI of [38]) of the magic state distillation routine. When the success probability for the scheme is too small then the overall distillation routine is inefficient, even if it performs very well when it does succeed. This is generically true in offline state preparation. If costs are low, we are happy to wait for some time for a state to be prepared. But the costs are not zero, as we actually want to make a state and perform a useful task.

The virtue of the decision theoretic approach is that all the assumptions, constraints and figures of merit are made explicit at the outset; the rest is derived. Thus, within this framework it is quite natural to include new constraints and features. For example, if experimental noise or inaccuracies or constraints are of concern, one must include those at the highest level, that is, they must be specified in the initial states, POVM, or loss function. Questions of robustness or imperfections, which plague other approaches, are simply a category mistake to ask here.

A number of open questions remain. The first class of questions are about extensions to the specific ideas developed in this manuscript. A simple modification is when Alice makes collective measurements on $N$ copies of $|\Psi_1\rangle$ or $|\Psi_2\rangle$. In this case the states look more orthogonal because $|\langle\Psi_1|\Psi_2\rangle|^{2N} < |\langle\Psi_1|\Psi_2\rangle|^{2}$. Based on our results in Fig. 3 we conjecture that the optimal joint measurement for the 0-1-$\lambda$ loss function will look closer to a Helstrom measurement than the USD measurement. The obvious question is: does a bound on the $N$ copy risk exist? Ideally the solution would be something like the quantum Chernoff bound [39] which bounds the minimum error probability asymptotically in $N$ (i.e. the risk of the 0/1 loss function).

The second class of questions are about the role of postselection in quantum information tasks. Although we have conjured an exotic loss function for which the optimal strategy includes postselection, it is not tied explicitly to an existing operational task. Nevertheless we suggest that our decision theoretic approach should be taken for any practical state discrimination (or estimation) problem which allows for the possibility of postselection. Extending our approach to parameter estimation seems to be the next great challenge. The results in this manuscript add weight to our suggested loss function [9]: report “reject” and incur loss $\lambda$ for mean squared error (MSE) above some threshold and incur the MSE loss below that threshold.